Many applications, from machine learning, image processing, and machine vision to optimization, use matrix multiplication as a fundamental building block. Matrix operations play an important role in determining the performance of such applications. This paper proposes a novel, efficient, and highly scalable hardware accelerator that matches the performance of a 2 GHz quad-core PC yet is suitable for low-power embedded systems requiring high-performance computation. Power, performance, and resource consumption are demonstrated on a fully functional prototype. The proposed hardware accelerator is 36× more energy efficient per unit of computation than a state-of-the-art Xeon processor of equal vintage and is 14× more efficient as a stand-alone platform with equivalent performance. An important comparison between simulated system estimates and real system performance is also carried out.